A critique of word similarity as a method for evaluating distributional semantic models

Authors

  • Miroslav Batchkarov
  • Thomas Kober
  • Jeremy Reffin
  • Julie Weeds
  • David J. Weir
Abstract

This paper aims to rethink the role of the word similarity task in distributional semantics research. We argue that, while it is a valuable tool, it should be used with care because it provides only an approximate measure of the quality of a distributional model. Word similarity evaluations assume that there exists a single notion of similarity that is independent of any particular application. Further, the small size and low inter-annotator agreement of existing data sets make it challenging to find significant differences between models.
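To make the last point concrete, the following is a minimal sketch of how a word similarity evaluation is commonly scored (Spearman correlation between model scores and human ratings) and how a bootstrap over word pairs can show whether the gap between two models is distinguishable from noise. It is an illustration only, not the procedure used in the paper: the percentile bootstrap, the function names, and the toy ratings are all assumptions introduced here.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined here; 0.0 is an assumption
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_diff_ci(gold, model_a, model_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for spearman(gold, a) - spearman(gold, b).

    Word pairs are resampled with replacement. An interval that contains 0
    suggests the data set cannot separate the two models.
    """
    rng = random.Random(seed)
    n = len(gold)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        g = [gold[i] for i in idx]
        a = [model_a[i] for i in idx]
        b = [model_b[i] for i in idx]
        diffs.append(spearman(g, a) - spearman(g, b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Invented toy ratings for eight word pairs (not taken from any data set).
    gold = [9.0, 8.5, 7.0, 6.5, 5.0, 3.5, 2.0, 1.0]
    model_a = [0.90, 0.70, 0.80, 0.60, 0.40, 0.50, 0.20, 0.10]
    model_b = [0.80, 0.90, 0.60, 0.70, 0.50, 0.30, 0.10, 0.20]
    print("rho A:", spearman(gold, model_a))
    print("rho B:", spearman(gold, model_b))
    print("95% CI for the difference:", bootstrap_diff_ci(gold, model_a, model_b))
```

With only a handful of word pairs, the resampled differences tend to vary widely, which is one way to see why small data sets make it hard to establish significant differences between models.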

Similar resources

Finnish resources for evaluating language model semantics

Distributional language models have consistently been demonstrated to capture semantic properties of words. However, research into the methods for evaluating the accuracy of the modeled semantics has been limited, particularly for less-resourced languages. This research presents three resources for evaluating the semantic quality of Finnish language distributional models: (1) semantic similarit...

DISSECT - DIStributional SEmantics Composition Toolkit

We introduce DISSECT, a toolkit to build and explore computational models of word, phrase and sentence meaning based on the principles of distributional semantics. The toolkit focuses in particular on compositional meaning, and implements a number of composition methods that have been proposed in the literature. Furthermore, DISSECT can be useful to researchers and practitioners who need models...

On the use of distributional models of semantic space to investigate human cognition

Huettig et al. (2006) demonstrated that corpus-based measures of word semantics predict language-mediated eye movements in the visual world. These data, in conjunction with the evidence from other tasks, are strong evidence for the psychological validity of corpus-based semantic similarity measures. But can corpus-based distributional models be more than just good measures of semantic similarity? ...

Capturing Discriminative Attributes in a Distributional Space: Task Proposal

If lexical similarity is not enough to reliably assess how word vectors would perform on various specific tasks, we need other ways of evaluating semantic representations. We propose a new task, which consists in extracting semantic differences using distributional models: given two words, what is the difference between their meanings? We present two proof-of-concept datasets for this task and ...

An Artificial Language Evaluation of Distributional Semantic Models

Recent studies of distributional semantic models have set up a competition between word embeddings obtained from predictive neural networks and word vectors obtained from count-based models. This paper is an attempt to reveal the underlying contribution of additional training data and post-processing steps on each type of model in word similarity and relatedness inference tasks. We do so by des...

Publication date: 2016